DETAILED ACTION
Specification 

1. The title of the invention is not descriptive. A new title is required that is clearly
indicative of the invention to which the claims are directed. 

The following title is suggested: Speech Synthesis for Synthesizing Missing Parts 

2. The disclosure is objected to because of the following informalities: 

On page 29, lines 2 to 3; on page 29, line 12; on page 29, line 18; on page 48,
line 20; on page 48, line 25; and on page 49, line 1, should "decompression section
43" be "decompression section 8"? The Specification talks about data supplied
by search section 6, so it appears more likely that decompression section 8 would be 
involved than decompression section 43, which is closer to search section 42. (See 
Figures 1 and 3.) 

On page 41, line 24, there should be a period at the end of the sentence. 
On page 68, line 17, "stringent that" should be "stringent than". 
On page 74, line 5, "stringent that" should be "stringent than". 
Appropriate correction is required. 



Claim Objections 

3. Claims 29 and 41 are objected to because of the following informalities: 

Claims 29 and 41 appear to be duplicate claims. Both depend from claim 25,
and both appear to set forth identical subject matter. 
Appropriate correction is required. 



Claim Rejections - 35 USC § 101 

4. 35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 

5. Claims 37 and 38 are rejected under 35 U.S.C. 101 because the claimed 
invention is directed to non-statutory subject matter. 

Independent claims 37 and 38 are directed to non-statutory subject matter
because they set forth a computer program without being recorded in a
computer-readable medium. The USPTO takes the position that computer programs, per se,
represent non-statutory subject matter. Applicant should amend independent claims 37
and 38 to include a computer-readable medium having program instructions for storing
a computer program.



Claim Rejections - 35 USC § 102 

6. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - 

(a) the invention was known or used by others in this country, or patented or described in a printed 
publication in this or a foreign country, before the invention thereof by the applicant for a patent. 

7. Claims 23, 35, and 37 are rejected under 35 U.S.C. 102(a) as being anticipated
by Reynar et al. 

Regarding independent claims 23, 35, and 37, Reynar et al. discloses a speech 
synthesis device, method, and computer program, comprising: 

"a first storage means for storing a plurality of pieces of voice unit data 
representative of one or more speech words" - stored audio data 270 is a long-term 
storage medium for converting speech input 290 from a speech recognition program 
240; stored audio data 270 may later be accessed for audio playback (column 9, lines 5 
to 10: Figure 2); 

"a selection means for selecting voice unit data whose reading is common with a 
speech word composing inputted sentence information from the plurality of pieces of 
voice unit data stored in the first storage means" - if multi-source input and playback 
utility 200 determines that stored audio data 270 is linked to a word, then the utility 
retrieves this audio data; a user selects a text portion of a document which he desires 
the multi-source input and playback utility to play; the multi-source input and playback 
utility 200 determines whether the word is linked to stored audio data 270 saved from a 
previous dictation session (column 11, lines 33 to 56: Figure 4: Steps 410 and 415);

"a missing part synthesis means, for a speech word among the sentence 
information for which the selection means could not select the voice unit data, for 
synthesizing speech data representative of a desired speech waveform" - alternately, 
utility 200 may determine that no speech is linked to the word; in this event, the utility 
checks for the existence of a TTS entry 220 corresponding to the current word; if such a 
TTS entry 220 exists, the TTS module 137 retrieves the TTS entry and returns it to the
word processor 210 (column 12, lines 9 to 27: Figure 4: Steps 410, 425, 430, and 440); 

"a synthesis means for combining the voice unit data selected from the selection 
means and the speech data synthesized by the missing part synthesis means to create 
data representative of a synthesis speech corresponding to the sentence information" - 
word processor 210 parses each word within the text selection in turn, and retrieves and 
plays either stored audio data 270 or a TTS entry 220; to a user of the multi-source 
input and playback utility 200, a continuous stream of mixed stored audio data and TTS 
entries is heard, sounding out the text selection (column 10, lines 43 to 50: Figure 2). 

"wherein the missing part synthesis means has a second storage means for 
storing a plurality of pieces of data representative of one or more pitches of voice 
waveform fragments" - optionally, the audible characteristics of the TTS entry 220, such 
as pitch, tone, and speed, may be manipulated by the utility prior to playback in order
to more closely match the sound of the TTS entry to that of the stored audio data 
(column 12, lines 31 to 35: Figure 4); implicitly, a TTS entry will have at least "one
pitch", which can then be manipulated prior to playback, and is stored in a TTS entry 
database 220 ("a second storage means") (Figure 2); 

"wherein data representative of voice waveform fragments composing the 
speech word whose voice unit data could not be selected is acquired from the second 
storage means and the acquired data is mutually combined to synthesize the speech 
data representative of the desired speech waveform" - word processor 210 parses 
each word within the text selection in turn, and retrieves and plays either stored audio 
data 270 or a TTS entry 220; to a user of the multi-source input and playback utility 200,
a continuous stream of mixed stored audio data and TTS entries is heard, sounding out 
the text selection (column 10, lines 43 to 50: Figure 2). 
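
The per-word playback decision described above for Reynar et al. can be illustrated with a minimal sketch. It is not code from the reference. The function name build_playback, the dictionary lookups, and the handling of a word that has neither stored audio nor a TTS entry are assumptions made only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def build_playback(
    words: List[str],
    stored_audio: Dict[str, bytes],
    tts_entries: Dict[str, bytes],
) -> List[Tuple[str, Optional[str]]]:
    """For each word, report which source would supply its audio.

    Hypothetical illustration: stored dictation audio is preferred,
    a TTS entry is used when no stored audio is linked to the word,
    and None marks a word covered by neither source (an assumption).
    """
    plan: List[Tuple[str, Optional[str]]] = []
    for word in words:
        if word in stored_audio:
            plan.append((word, "stored_audio"))
        elif word in tts_entries:
            plan.append((word, "tts_entry"))
        else:
            plan.append((word, None))
    return plan
```

Playing the listed sources in order would give the mixed stream of stored audio and TTS entries that the mapping relies on.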

Claim Rejections - 35 USC § 103 

8. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains.
Patentability shall not be negatived by the manner in which the invention was made. 

9. Claims 24 to 29, 34, 36, and 38 to 41 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Reynar et al. in view of Kato et al. (EP '072). 

Concerning independent claims 34, 36, and 38, Reynar et al. discloses a speech
synthesis device, method, and computer program, comprising a first storage means, a 
selection means, and a missing part synthesis means of independent claims 23, 35, and 
37, but does not expressly disclose the limitations of "wherein the first storage means 
stores phonetic data representative of a reading of the voice unit data with the phonetic 
data being associated with the voice unit data, and wherein the selection means 
operates to handle voice unit data which is associated with the phonetic data 
representative of a reading matching with the reading of a speech word composing the 
sentence information as voice unit data whose reading is common with the speech 
word." Reynar et al. suggests that stored audio data 270 corresponds to phonetic data 
from speech recognition. (Column 9, Lines 2 to 10) However, Reynar et al. does not
disclose choosing a voice unit from a plurality of alternative voice units that is
associated with a desired reading of a speech word in the context of a sentence. Still, 
Kato et al. (EP '072) teaches a speech synthesizing system and speech synthesizing 
method, where speech synthesis is performed taking into account a construction of a 
sentence. (¶[0007] - ¶[0008]) A prosodic data retrieving section 140 searches prosodic
data stored in prosodic information database 130 in response to output from language
processing section 120, and outputs the search result. The retrieval keys that match
the search key to a certain degree are selected as retrieval candidates, and of the 
selected candidates, the key having the highest degree of matching is selected. 
(¶[0062] - ¶[0063]) Prosodic information corresponds to "phonetic data representative
of a reading of the voice data unit". An objective is to provide a speech synthesis 
system capable of generating natural sounding speech from arbitrary input texts having 
good sound quality. (¶[0009]) It would have been obvious to one having ordinary skill
in the art to store phonetic data representative of a reading of the voice unit data so as 
to match a reading of a word in a sentence by prosody as taught by Kato et al. (EP 
'072) in a multi-source input and playback utility of Reynar et al. for a purpose of
generating natural sounding speech having good sound quality. 

Concerning claims 24 to 27 and 39, Kato et al. (EP '072) teaches matching
prosody. (¶[0062] - ¶[0063]) Prosody is equivalent to "cadence". The retrieval keys
that match the search key to a certain degree are selected as retrieval candidates, and 
of the selected candidates, the key having the highest degree of matching is selected. 
Implicitly, those candidates that do not have the highest degree of matching are 
excluded ("to exclude from the objects of selection voice unit data whose cadence does
not match with the cadence prediction result under the predetermined conditions"). 
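
The two-stage selection attributed to Kato et al. (EP '072), in which keys matching "to a certain degree" become candidates and the best-matching candidate is chosen, can be sketched as follows. This is an illustration only. The similarity measure (Python's difflib.SequenceMatcher), the threshold of 0.5, and the name best_match are assumptions and are not taken from the reference.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def best_match(
    search_key: str,
    retrieval_keys: Iterable[str],
    threshold: float = 0.5,  # assumed cut-off for "a certain degree"
) -> Optional[str]:
    """Return the retrieval key with the highest degree of matching.

    Keys scoring below the threshold never become candidates; among the
    candidates, every key but the top-scoring one is implicitly excluded.
    """
    scored = [
        (SequenceMatcher(None, search_key, key).ratio(), key)
        for key in retrieval_keys
    ]
    candidates = [pair for pair in scored if pair[0] >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]
```

With a search key of "abcd" and retrieval keys "abcd", "abxx", and "zzzz", the first two qualify as candidates and "abcd" is selected.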

Concerning claims 28 to 29 and 40 to 41, Reynar et al. discloses that audio
characteristics include pitch, tone, and speed. (Column 12, lines 31 to 35) Kato et al.
(EP '072) teaches that prosodic information database 130 stores a fundamental 
frequency pattern, and prosodic data retrieval section 140 retrieves a fundamental 
frequency pattern having the highest match. (¶[0062] - ¶[0063]) Prosody is equivalent
to "cadence", and a fundamental frequency pattern corresponds to "a time variation in 
pitch" because the fundamental frequency is the same as "pitch", and the pattern 
corresponds to its time evolution. See Figures 2 to 4 of Kato et al. (EP '072). 

10. Claims 30 to 33 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Reynar et al. in view of Kato et al. (EP '072) as applied to claims 23 to 25 above, and 
further in view of Chihara. 

Reynar et al. discloses that audio characteristics include pitch, tone, and speed.
(Column 12, lines 31 to 35) Reynar et al. suggests manipulating the speed
characteristics of a TTS entry, but does not expressly say that utterance speed
conversion means acquires utterance speed data specifying conditions, selecting or 
converting speech data and/or voice unit data at a speed fulfilling the specified 
conditions, and eliminating or adding segments by the utterance speed conversion 
means. Still, Chihara teaches a method of controlling high-speed reading in a text-to- 
speech conversion system, where control factors are required to predict a duration 
length of each phoneme or word. The prediction uses pieces of information such as the
phoneme, the kind of adjacent phonemes, the number of mora in the phrase, and the
position in the sentence, which are sent to a duration estimation section. The predicted 
result is sent to a duration correcting section to correct the predicted value where the 
user designates the utterance speed. (Column 5, Lines 34 to 67: Figure 20) At a high 
utterance speed, a number of superimposed voice segments is subtracted ("by 
eliminating a segment") to make the waveform, and at a low utterance speed, the 
number of superimposed segments is repeated ("adding a segment") for making the 
waveform. (Column 6, Lines 1 to 11: Figure 21) An objective is to control reading 
speed from a phoneme and prosody character string including accent and intonation. 
(Column 1, Lines 19 to 28: Figure 15) It would have been obvious to one having
ordinary skill in the art to provide utterance speed conversion at a speed fulfilling 
specified conditions as taught by Chihara in a multi-source input and playback utility of 
Reynar et al. for a purpose of controlling reading speed from a prosody character string 
including accent and intonation. 
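
The segment-level effect described for Chihara, in which segments are eliminated at a high utterance speed and repeated at a low utterance speed, can be illustrated with a short sketch. It is a simplification under stated assumptions: the uniform resampling policy, the speed factor convention (2.0 meaning twice as fast), and the name convert_speed are hypothetical and do not reproduce the reference's actual waveform superposition method.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def convert_speed(segments: Sequence[T], speed: float) -> List[T]:
    """Drop segments when speed > 1 and repeat segments when speed < 1.

    Assumed policy: pick segments at evenly spaced positions so that the
    output holds about len(segments) / speed items, never fewer than one.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not segments:
        return []
    count = len(segments)
    target = max(1, round(count / speed))
    return [
        segments[min(count - 1, int(i * count / target))]
        for i in range(target)
    ]
```

For four segments at a speed factor of 2.0, every other segment is eliminated; for two segments at a factor of 0.5, each segment is repeated once.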

Conclusion 

11. The prior art made of record and not relied upon is considered pertinent to
Applicant's disclosure. 

Kato et al. ('309) is the equivalent in the United States of Kato et al. (EP '072). 

Kondo et al., Ohtsuka et al., Yamada, Imai et al., Nishimura et al., Kasai et al. 
('530), and Kasai et al. ('962) disclose related art. 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to MARTIN LERNER whose telephone number is 
(571)272-7608. The examiner can normally be reached on 8:30 AM to 6:00 PM 
Monday to Thursday. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David R. Hudspeth, can be reached on (571) 272-7843. The fax phone
number for the organization where this application or proceeding is assigned is 571- 
273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Martin Lerner/ 
Primary Examiner 
Art Unit 2626 
June 11, 2009 



